
V-SAT: Video Subtitle Annotation Tool

Kundu, Arpita, Chakraborty, Joyita, Desarkar, Anindita, Sen, Aritra, Patil, Srushti Anil, Raman, Vishwanathan

arXiv.org Artificial Intelligence

The surge of audiovisual content on streaming platforms and social media has heightened the demand for accurate and accessible subtitles. However, existing subtitle generation methods, primarily speech-based transcription or OCR-based extraction, suffer from several shortcomings: poor synchronization, incorrect or harmful text, inconsistent formatting, inappropriate reading speeds, and an inability to adapt to dynamic audio-visual contexts. Current approaches often address isolated issues, leaving post-editing a labor-intensive and time-consuming process. In this paper, we introduce V-SAT (Video Subtitle Annotation Tool), a unified framework that automatically detects and corrects a wide range of subtitle quality issues. By combining Large Language Models (LLMs), Vision-Language Models (VLMs), image processing, and Automatic Speech Recognition (ASR), V-SAT leverages contextual cues from both audio and video. Subtitle quality improved: the SUBER score dropped from 9.6 to 3.54 after all language-mode issues were resolved, and image-mode issues were detected with F1-scores of ~0.80. Human-in-the-loop validation ensures high-quality results, providing the first comprehensive solution for robust subtitle annotation.


9 free AI tools that run locally on your PC

PCWorld

It's no coincidence that many programs using artificial intelligence techniques are open source and thus completely free. The early approaches originated in academia, where free software licences are common practice to promote collaboration and further development. This article, however, is not about AI frameworks and libraries, but about tangible, useful AI applications for your own computer. The term AI encompasses various methods such as neural networks, machine learning, deep learning, and natural language processing, and all of these approaches are represented in the following compilation. The various approaches to pattern recognition, machine-processed decision trees, and task automation build on existing training data and ready-made models. The availability of this data is one reason useful AI techniques can be found in free software today at all.


Localize content into multiple languages using AWS machine learning services

#artificialintelligence

Over the last few years, online education platforms have seen increased adoption of, and growing demand for, video-based learning because it offers an effective medium to engage learners. To expand into international markets and reach a culturally and linguistically diverse population, businesses are also looking to diversify their learning offerings by localizing content into multiple languages. These businesses need reliable and cost-effective ways to solve their localization use cases. Localizing content mainly involves translating the original voices into new languages and adding visual aids such as subtitles. Traditionally, this process has been manual, time-consuming, and cost-prohibitive, often requiring work with localization specialists.


Create video subtitles with Amazon Transcribe using this no-code workflow

#artificialintelligence

Creating subtitles for video content poses challenges for organizations of any size. To address those challenges, Amazon Transcribe offers a feature that creates subtitles directly within the service. No machine learning (ML) expertise or code is required to get started. This post walks you through setting up a no-code workflow for creating video subtitles with Amazon Transcribe in your Amazon Web Services account. The terms subtitles and closed captions are commonly used interchangeably; both refer to spoken text displayed on the screen.
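Amazon Transcribe can emit subtitles in the SubRip (SRT) and WebVTT formats. To illustrate what an SRT file contains, here is a minimal Python sketch that renders timed text segments as SRT cues. It is not how Amazon Transcribe generates its output, and the `(start, end, text)` segment tuples and the function names `format_srt_time` and `to_srt` are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start_seconds, end_seconds, text) segments as an SRT document.

    Each cue is a 1-based index line, a 'start --> end' timing line, and the
    text; cues are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```

For instance, `to_srt([(0.0, 1.5, "Hello"), (1.5, 3.25, "World")])` produces two numbered cues timed `00:00:00,000 --> 00:00:01,500` and `00:00:01,500 --> 00:00:03,250`.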